Non-Unique joins and Concurrent Cursors

Non-unique joins

SQLs can get complex when you want to join tables with non-unique join keys. Consider a financials system with tables account, payment, and receipt. An account can have many payments and many receipts. Any query that tries to select payments and receipts together will obtain a cross product of the payments and receitps for each account.

For example:

SELECT a.account_no, sum(payment_amt), sum(receipt_amt)
FROM   account a, payment p, receipt r
WHERE  a.account_no = p.account_no
AND    a.account_no = r.account_no
GROUP BY a.account_no;

This SQL is not going to work. For a given account with 5 payments and 3 receipts, the join would produce 5x3=15 rows before the GROUP BY aggregates them to one. The payment total will be 3 times higher than it should be, and the receipt total will be 5 times higher.

One way to resolve this using Oracle prior to v7.3 is as follows

SELECT a.account_no
,      sum(payment_amt) / count(distinct r.rowid)
,      sum(receipt_amt) / count(distinct p.rowid)
FROM   account a, payment p, receipt r
WHERE  a.account_no = p.account_no
AND    a.account_no = r.account_no
GROUP BY a.account_no;

Although this seems very clever, it is inefficient and difficult to read - particularly if extended to three tables. A better solution is:

SELECT a.account_no, p.payment_total, r.receipt_total
FROM   account a
,(     SELECT account_no, sum(receipt_amt) as receipt_total
       FROM   receipt
       GROUP BY account_no) r
,(     SELECT account_no, sum(payment_amt) as payment_total
       FROM   payment
       GROUP BY account_no) p
WHERE  a.account_no = r.account_no
AND    a.account_no = p.account_no
GROUP BY a.account_no;

What happens if it is not this simple? Above, the non-unique joins have been made unique using aggregation. What if that is not possible?

Concurrent Cursors

Consider a system with tables job, job_status_hist, and invoice. job_status_hist records the changes in status of a job over time - eg. (O)pen, (C)losed, (R)eopened. A job may have many invoices which are sent out periodically over the duration of a job. A report is required that shows the initial cost of a job (invoices sent between the original Open status record, and the first Closed status record), and the total cost of a job (total of all invoices).

This is quite possible to produce in SQL, but would probably be large and convoluted and difficult to read. Furthermore, minor changes in the business rule (eg. special coding to handle jobs closed in error and immediately reopened) would be difficult to handle. The logical solution is to code the report procedurally. An algorithm might be:

job

job_status_hist

invoice

Simple to code and simple to read. But very inefficient if the number of jobs is large. The SQLs or Cursors required to get the dates and costs inside the loop must be executed separately for every job.

A better way to do this is Concurrent Cursors. Instead of running independent SQLs inside the loop for every job, we open three cursors at the beginning - one from each table - and process them in job number order.

For example:

job

job_status_hist

invoice

This technique has allowed us to process high volumes of data transactionally without resorting to indexes. It is only marginally less efficient than a single SQL performing Hash or Sort-Merge joins (not that it would be possible in this example).